AITopics | score 0

Collaborating Authors

score 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DPAIL: Training Diffusion Policy for Adversarial Imitation Learning without Policy Optimization

Neural Information Processing SystemsJun-14-2026, 13:25:17 GMT

Human experts employ diverse strategies to complete a task, producing to multimodal demonstration data. Although traditional Adversarial Imitation Learning (AIL) methods have achieved notable success, they often collapse theses multimodal behaviors into a single strategy, failing to replicate expert behaviors. To overcome this limitation, we propose DPAIL, an adversarial IL framework that leverages diffusion models as a policy class to enhance expressiveness. Building on the Adversarial Soft Advantage Fitting (ASAF) framework, which removes the need for policy optimization steps, DPAIL trains a diffusion policy using a binary cross-entropy objective to distinguish expert trajectories from generated ones. To enable optimization of the diffusion policy, we introduce a novel, tractable lower bound on the policy's likelihood. Through comprehensive quantitative and qualitative evaluations against various baselines, we demonstrate that our method not only captures diverse behaviors but also remains robust as the number of behavior modes increases.

diffusion model, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback

LMM-IQA: Image Quality Assessment for Low-Dose CT Imaging

Celik, Kagan, Unal, Mehmet Ozan, Ertas, Metin, Yildirim, Isa

arXiv.org Artificial IntelligenceNov-11-2025

Low-dose computed tomography (CT) represents a significant improvement in patient safety through lower radiation doses, but increased noise, blur, and contrast loss can diminish diagnostic quality. Therefore, consistency and robustness in image quality assessment become essential for clinical applications. In this study, we propose an LLM-based quality assessment system that generates both numerical scores and textual descriptions of degradations such as noise, blur, and contrast loss. Furthermore, various inference strategies - from the zero-shot approach to metadata integration and error feedback - are systematically examined, demonstrating the progressive contribution of each method to overall performance. The resultant assessments yield not only highly correlated scores but also interpretable output, thereby adding value to clinical workflows. The source codes of our study are available at https://github.com/itu-biai/lmms_ldct_iqa.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.07298

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.89)

Industry:

Health & Medicine > Nuclear Medicine (0.55)
Health & Medicine > Therapeutic Area > Oncology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring

Jiao, Hong, Choi, Hanna, Hua, Haowei

arXiv.org Artificial IntelligenceNov-3-2025

Exploring the Utilities of the Rationales from Large Language Models to Enhance Automated Essay Scoring Hong Jiao University of Maryland, College Park Hanna Choi University of Maryland, College Park Haowei Hua Princeton University Abstract This study explored the utilities of rationales generated by GPT-4.1 and GPT -5 in automated scoring using Prompt 6 essays from the 2012 Kaggle ASAP data . Essay-based scoring was compared with rationale-based scoring. The study found in general essay -based scoring performed better than rationale -based scoring with higher Quadratic Weighted Kappa (QWK). However, rationale-based scoring led to higher scoring accuracy in terms of F1 scores for score 0 which had less representation due to class imbalance issues . The ensemble modeling of essay-based scoring models increased the scoring accuracy at both specific score levels and across all score levels. The ensemble modeling of essay -based scoring and each of the rationale-based scoring performed about the same. Further ensemble of essay -based scoring and both rationale-based scoring yielded the best scoring accuracy with QWK of 0.870 compared with 0.848 reported in literature. Introduction Automated essay scoring methodology develops along with the advances in AI technology. Starting from the early supervised machine learning models based on engineered features ( e.g., Mahana et al., 2012) to recent use of large language models (LLMs), the methods for automated essay scoring as demonstrated in Appendix A evolved with the advances in machine learning, deep learning, language models, and LLMs. Using automated scoring of Prompt 6 in the Automated Student Assessment Prize (ASAP) dataset from Kaggle, this study intends to explore the utility of rationales generated by LLMs in enhancing automated essay scoring. For the ASAP Prompt 6, automated scoring models have been developed since 2012 after the Kaggle competition.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.27131

Country: North America > United States > Maryland > Prince George's County > College Park (0.44)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer-Aided Assessment (1.00)
Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Granular Study of Safety Pretraining under Model Abliteration

Agnihotri, Shashank, Jakubassa, Jonas, Dey, Priyam, Goyal, Sachin, Schiele, Bernt, Radhakrishnan, Venkatesh Babu, Keuper, Margret

arXiv.org Artificial IntelligenceOct-6-2025

Open-weight LLMs can be modified at inference time with simple activation edits, which raises a practical question for safety: do common safety interventions like refusal training or metatag training survive such edits? We study model abliteration, a lightweight projection technique designed to remove refusal-sensitive directions, and conduct a controlled evaluation across a granular sequence of Safety Pretraining checkpoints for SmolLM2-1.7B, alongside widely used open baselines. For each of 20 systems, original and abliterated, we issue 100 prompts with balanced harmful and harmless cases, classify responses as **Refusal** or **Non-Refusal** using multiple judges, and validate judge fidelity on a small human-labeled subset. We also probe whether models can identify refusal in their own outputs. Our study produces a checkpoint-level characterization of which data-centric safety components remain robust under abliteration, quantifies how judge selection influences evaluation outcomes, and outlines a practical protocol for integrating inference-time edits into safety assessments. Code: https://github.com/shashankskagnihotri/safety_pretraining.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.02768

Country: Europe > Germany (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)

Add feedback

SAGE: A Realistic Benchmark for Semantic Understanding

Goel, Samarth, Lee, Reagan J., Ramchandran, Kannan

arXiv.org Artificial IntelligenceSep-26-2025

As large language models (LLMs) achieve strong performance on traditional benchmarks, there is an urgent need for more challenging evaluation frameworks that probe deeper aspects of semantic understanding. We introduce SAGE (Semantic Alignment & Generalization Evaluation), a rigorous benchmark designed to assess both embedding models and similarity metrics across five categories: Human Preference Alignment, Transformation Robustness, Information Sensitivity, Clustering Performance, and Retrieval Robustness. Unlike existing benchmarks that focus on isolated capabilities, SAGE evaluates semantic understanding through adversarial conditions, noisy transformations, and nuanced human judgment tasks across 30+ datasets. Our comprehensive evaluation of 9 embedding models and classical metrics reveals significant performance gaps, with no single approach excelling across all dimensions. For instance, while state-of-the-art embedding models like OpenAI's text-embedding-3-large dominate in aligning with human preferences (0.682 vs. 0.591 for the best classical metric), they are significantly outperformed by classical metrics on information sensitivity tasks, where Jaccard Similarity achieves a score of 0.905 compared to the top embedding score of 0.794. SAGE further uncovers critical trade-offs: OpenAI's text-embedding-3-small achieves the highest clustering performance (0.483) but demonstrates extreme brittleness with the lowest robustness score (0.011). SAGE exposes critical limitations in current semantic understanding capabilities and provides a more realistic assessment of model robustness for real-world deployment.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.2131

Country:

Europe (1.00)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (1.00)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Concealment of Intent: A Game-Theoretic Analysis

Wu, Xinbo, Umrawal, Abhishek, Varshney, Lav R.

arXiv.org Artificial IntelligenceAug-19-2025

As large language models (LLMs) grow more capable, concerns about their safe deployment have also grown. Although alignment mechanisms have been introduced to deter misuse, they remain vulnerable to carefully designed adversarial prompts. In this work, we present a scalable attack strategy: intent-hiding adversarial prompting, which conceals malicious intent through the composition of skills. We develop a game-theoretic framework to model the interaction between such attacks and defense systems that apply both prompt and response filtering. Our analysis identifies equilibrium points and reveals structural advantages for the attacker. To counter these threats, we propose and analyze a defense mechanism tailored to intent-hiding attacks. Empirically, we validate the attack's effectiveness on multiple real-world LLMs across a range of malicious behaviors, demonstrating clear advantages over existing adversarial prompting techniques.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.20841

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Solving Scene Understanding for Autonomous Navigation in Unstructured Environments

Renji, Naveen Mathews, K, Kruthika, Keshavamurthy, Manasa, Kumari, Pooja, Rajarajeswari, S.

arXiv.org Artificial IntelligenceJul-29-2025

Autonomous vehicles are the next revolution in the automobile industry and they are expected to revolutionize the future of transportation. Understanding the scenario in which the autonomous vehicle will operate is critical for its competent functioning. Deep Learning has played a massive role in the progress that has been made till date. Semantic Segmentation, the process of annotating every pixel of an image with an object class, is one crucial part of this scene comprehension using Deep Learning. It is especially useful in Autonomous Driving Research as it requires comprehension of drivable and non-drivable areas, roadside objects and the like. In this paper semantic segmentation has been performed on the Indian Driving Dataset which has been recently compiled on the urban and rural roads of Bengaluru and Hyderabad. This dataset is more challenging compared to other datasets like Cityscapes, since it is based on unstructured driving environments. It has a four level hierarchy and in this paper segmentation has been performed on the first level. Five different models have been trained and their performance has been compared using the Mean Intersection over Union. These are UNET, UNET+RESNET50, DeepLabsV3, PSPNet and SegNet. The highest MIOU of 0.6496 has been achieved. The paper discusses the dataset, exploratory data analysis, preparation, implementation of the five models and studies the performance and compares the results achieved in the process.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2507.20389

Country: Asia > India > Karnataka > Bengaluru (0.35)

Genre: Research Report (1.00)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Multimodal Sentiment Analysis on CMU-MOSEI Dataset using Transformer-based Models

Gajjar, Jugal, Ranaware, Kaustik

arXiv.org Artificial IntelligenceJul-16-2025

This project performs multimodal sentiment analysis using the CMU-MOSEI dataset, using transformer-based models with early fusion to integrate text, audio, and visual modalities. We employ BERTbased encoders for each modality, extracting embed-dings that are concatenated before classification. The model achieves strong performance, with 97.87% 7-class accuracy and a 0.9682 F1-score on the test set, demonstrating the effectiveness of early fusion in capturing cross-modal interactions. The training utilized Adam optimization (lr=1e-4), dropout (0.3), and early stopping to ensure generalization and robustness. Results highlight the superiority of transformer architectures in modeling multimodal sentiment, with a low MAE (0.1060) indicating precise sentiment intensity prediction. Future work may compare fusion strategies or enhance interpretability.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.0611

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.88)

Add feedback

Iterative Self-Improvement of Vision Language Models for Image Scoring and Self-Explanation

Tanji, Naoto, Yamasaki, Toshihiko

arXiv.org Artificial IntelligenceJun-4-2025

ABSTRACT Image scoring is a crucial task in numerous real-world applications. To trust a model's judgment, understanding its rationale is essential. This paper proposes a novel training method for Vision Language Models (VLMs) to generate not only image scores but also corresponding justifications in natural language. Leveraging only an image scoring dataset and an instruction-tuned VLM, our method enables self-training, utilizing the VLM's generated text without relying on external data or models. In addition, we introduce a simple method for creating a dataset designed to improve alignment between predicted scores and their textual justifications. By iteratively training the model with Direct Preference Optimization on two distinct datasets and merging them, we can improve both scoring accuracy and the coherence of generated explanations. Index T erms-- Vision language model, Explainable AI, Image scoring, Self-training, Direct Preference Optimization 1. INTRODUCTION Deep learning is revolutionizing image analysis, enabling automated classification and scoring with enhanced accuracy and efficiency. Examples include disease detection in medical images, defect identification in quality control, and predicting advertising effectiveness.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.02708

Country: Asia > Japan (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.66)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.48)

Add feedback

MaXIFE: Multilingual and Cross-lingual Instruction Following Evaluation

Liu, Yile, Ma, Ziwei, Jiang, Xiu, Hu, Jinglu, Chang, Jing, Li, Liang

arXiv.org Artificial IntelligenceJun-4-2025

With the rapid adoption of large language models (LLMs) in natural language processing, the ability to follow instructions has emerged as a key metric for evaluating their practical utility. However, existing evaluation methods often focus on single-language scenarios, overlooking the challenges and differences present in multilingual and cross-lingual contexts. To address this gap, we introduce MaXIFE: a comprehensive evaluation benchmark designed to assess instruction-following capabilities across 23 different languages with 1667 verifiable instruction tasks. MaXIFE integrates both Rule-Based Evaluation and Model-Based Evaluation, ensuring a balance of efficiency and accuracy. We applied MaXIFE to evaluate several leading commercial LLMs, establishing baseline results for future comparisons. By providing a standardized tool for multilingual instruction-following evaluation, MaXIFE aims to advance research and development in natural language processing.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.01776

Country:

Asia (0.67)
North America > Mexico (0.27)

Genre: Research Report (0.81)

Industry:

Education (0.67)
Leisure & Entertainment (0.45)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback